Open source optical character recognition for historical research



Related resources

Integrating Optical Character Recognition and Machine Translation of Historical Documents

Machine Translation (MT) plays a critical role in expanding capacity in the translation industry. However, many valuable documents, including digital documents, are encoded in formats that are not accessible to machine processing (e.g., historical or legal documents). Such documents must be passed through a process of Optical Character Recognition (OCR) to render the text suitable for MT. No matter how...


Optical Character Recognition by Open source OCR Tool Tesseract: A Case Study

The optical character recognition (OCR) method has been used to convert printed text into editable text. OCR is a very useful and popular method in various applications. Its accuracy can depend on text preprocessing and segmentation algorithms. Sometimes it is difficult to retrieve text from an image because of differences in size, style, and orientation, a complex image background, etc. We begin...


Optical Character Recognition

This paper describes two implementations of optical character recognition: a template matching method, and a feature extraction method followed by support vector machine classification. With proper image preprocessing, the text is segmented into isolated characters, and the correlations between a single character and a given set of templates are computed to find the similarities and then ident...


Optical Character Recognition Systems

Optical character recognition (OCR) is the process of classifying optical patterns contained in a digital image. Character recognition is achieved through segmentation, feature extraction and classification. This chapter presents the basic ideas of OCR needed for a better understanding of the book. The chapter starts with a brief background and history of OCR systems. Then the di...


Optical Character Recognition

In this paper we present, for the first time, the development of a new system for the off-line optical recognition of the characters used in the Orthodox Hellenic Byzantine Music Notation, which has been established since 1814. We describe the structure of the new system and propose algorithms for the recognition of the 71 distinct character classes, based on Wavelets, 4-projections and other str...



Journal

Journal title: Journal of Documentation

Year: 2012

ISSN: 0022-0418

DOI: 10.1108/00220411211256021